Universidade Federal de Viçosa

Programa de pós-graduação em Genética e Melhoramento

Departamento de Biologia Geral




A new look at the genotype-by-environment interaction: enviromics and probabilistic models



Prof. Dr. Kaio Olimpio das Graças Dias

Dr. Saulo Fabrício da Silva Chaves

Background

Concepts and biological aspects

Data example

data = read.csv('https://raw.githubusercontent.com/Kaio-Olimpio/Probability-for-GEI/master/maize_dataset.csv', 
                stringsAsFactors = TRUE)
data = transform(data, Rep = as.factor(Rep), Block = as.factor(Block))

ngen = nlevels(data$Hybrid)
nrep = nlevels(data$Rep)
nblock = nlevels(data$Block)
nloc = nlevels(data$Location)
  • 36 maize hybrids
  • 16 locations
  • Lattice: 12 blocks in 2 replicates

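library(ggplot2)  # provides ggplot(), geom_tile() and scale_fill_viridis_c()

# Heatmap of GY by hybrid and location, both ordered by their mean GY
data |> ggplot(aes(y = reorder(Hybrid, GY), x = reorder(Location, GY), fill = GY)) + 
  geom_tile() + theme_minimal() +
  scale_fill_viridis_c(option = "turbo") + 
  labs(y = 'Hybrid', x = 'Location')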

The statistical analysis of MET

  • ANOVA-based methods

    • Shukla 1972 - Ecovalence (Wricke 1965)
  • Regression-based methods

    • Finlay & Wilkinson 1963 - Eberhart & Russell 1966
  • Non-parametric/risk-based methods

    • Lin & Binns 1988 - Eskridge 1990 - Annicchiarico 1992
  • Multivariate methods

    • AMMI (Zobel et al 1988) - GGE-Biplot (Yan et al 2000)
  • Mixed Models

    • FA Models (Piepho 1997, Smith et al 2011) - Reaction norms (Jarquin et al 2014)
  • Bayesian Models

    • AMMI (Crossa et al. 2011) - Probabilistic Models (Dias et al 2022)

Bayesian Inference


  • Bayesian inference is reallocation of credibility across possibilities

  • The possibilities to which we assign credibility (probability) are the parameter values within meaningful mathematical models

Probabilities

  • A Murder Mystery (Credits: Christopher Bishop)
  • A fiendish murder has been committed
    • Who did it?
  • There are two suspects:
    • The Butler
    • The Cook

  • There are three possible murder weapons:
    • a butcher’s Knife
    • a Pistol
    • a fireplace Poker

Prior Distribution

  • Butler has served family well for many years
  • Cook hired recently, rumors of dodgy history

\[ P(Culprit = Butler) = 20\% \]

\[ P(Culprit = Cook) = 80\% \]

\[ Culprit \in \{Butler, Cook\} \]

Conditional Distribution

  • Butler is ex-army, keeps a gun in a locked drawer
  • Cook has access to lots of knives
  • Butler is older and getting frail

\[ P(Weapon | Culprit) \]

Joint Distribution

  • What is the probability that the Cook committed the murder using the Pistol?

\[ P(Culprit = Cook) = 80\% \]

\[ P(Weapon = Pistol | Culprit = Cook) = 5\% \]

\[ P(Weapon = Pistol, Culprit = Cook) = 80\% \times 5\% = 4\% \]

  • Likewise for the other five combinations of Culprit and Weapon

\[ P(Weapon, Culprit) = P(Weapon|Culprit) \times P(Culprit) \]

PRODUCT RULE

\[ P(x, y) = P(y|x) \times P(x) \]

Generative Viewpoint

Marginal Distribution of Culprit

SUM RULE

\[ P(x) = \sum_{y} P(x,y) \]

Posterior Distribution

  • We discover a Pistol at the scene of the crime

  • Cook = 20%

  • Butler = 80%
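
The posterior above can be reproduced with the product and sum rules. The sketch below is a minimal illustration in R. Note that \(P(Weapon = Pistol | Culprit = Butler) = 80\%\) is not stated on the slides: it is assumed here because it is the value implied by the prior, by \(P(Weapon = Pistol | Culprit = Cook) = 5\%\), and by the stated posterior.

```r
# Prior beliefs about the culprit (from the slides)
prior <- c(Butler = 0.20, Cook = 0.80)

# P(Weapon = Pistol | Culprit); the Butler value is an assumption
# implied by the posterior shown on the slide
lik_pistol <- c(Butler = 0.80, Cook = 0.05)

# Product rule: P(Pistol, Culprit) = P(Pistol | Culprit) * P(Culprit)
joint <- lik_pistol * prior

# Sum rule: P(Pistol) = sum over culprits of P(Pistol, Culprit)
evidence <- sum(joint)

# Bayes' theorem: P(Culprit | Pistol) = P(Pistol, Culprit) / P(Pistol)
posterior <- joint / evidence
print(round(posterior, 2))
```

Finding the Pistol moves most of the credibility from the Cook to the Butler, even though the Cook was the main suspect a priori.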

Generative viewpoint

Bayes’ theorem

\[ P(x,y) = P(y|x)P(x) = P(x|y)P(y) \]

\[ \frac{P(y|x)P(x)}{P(x)} = \frac{P(x|y)P(y)}{P(x)} \]

\[ P(y|x) = \frac{P(x|y)P(y)}{P(x)} \]

\[ P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)} \]

  • \(P(\theta)\) = PRIOR belief before making a particular observation

  • \(P(\theta|y)\) = POSTERIOR belief after making the observation

  • \(P(y|\theta)\) = LIKELIHOOD

  • Posterior is the prior for the next observation
    • Intrinsically incremental
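
One way to see this incremental behaviour is a conjugate Beta-Binomial model. This model is not part of the slides; it is used here only because its update has a closed form, and the counts are made up. The sketch shows that updating in two steps (the first posterior serving as the second prior) gives the same result as updating once with all the data.

```r
# Conjugate update: a Beta(a, b) prior combined with binomial counts
# gives a Beta(a + successes, b + failures) posterior
update_beta <- function(prior, successes, failures) {
  c(a = prior[["a"]] + successes, b = prior[["b"]] + failures)
}

prior <- c(a = 1, b = 1)                  # flat prior (assumption)

# Hypothetical data arriving in two batches
post_1 <- update_beta(prior,  successes = 7, failures = 3)
post_2 <- update_beta(post_1, successes = 3, failures = 7)  # post_1 is the new prior

# Same data analysed in a single step
post_batch <- update_beta(prior, successes = 10, failures = 10)

print(post_2)
print(post_batch)
```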

Linear Regression

\[ y = \alpha + \beta x + \epsilon \]


\[\epsilon \sim N(0, \sigma) \]


\[ P(\alpha, \beta , \sigma |y, x) \propto P(y|x, \alpha , \beta, \sigma) P(\alpha) P(\beta) P(\sigma) P(x) \]
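
As a rough illustration of how samples from this posterior can be obtained, the sketch below runs a random-walk Metropolis sampler in base R. Everything in it is an assumption made for the example: the data are simulated, the priors (normal for \(\alpha\) and \(\beta\), half-Cauchy for \(\sigma\)) are arbitrary choices, and the sampler is far simpler than what dedicated software uses.

```r
set.seed(123)

# Simulated data (assumed true values: alpha = 2, beta = 0.5, sigma = 1)
n <- 50
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)

# Unnormalised log-posterior: log-likelihood plus log-priors
log_post <- function(par) {
  alpha <- par[1]; beta <- par[2]; sigma <- par[3]
  if (sigma <= 0) return(-Inf)                  # sigma must be positive
  sum(dnorm(y, mean = alpha + beta * x, sd = sigma, log = TRUE)) +
    dnorm(alpha, 0, 10, log = TRUE) +           # assumed prior for alpha
    dnorm(beta, 0, 10, log = TRUE) +            # assumed prior for beta
    dcauchy(sigma, 0, 5, log = TRUE)            # half-Cauchy, up to a constant
}

# Random-walk Metropolis: propose a small move, accept it with
# probability min(1, posterior ratio)
n_iter <- 20000
draws <- matrix(NA_real_, n_iter, 3,
                dimnames = list(NULL, c("alpha", "beta", "sigma")))
current <- c(0, 0, 1)
current_lp <- log_post(current)
for (s in seq_len(n_iter)) {
  proposal <- current + rnorm(3, sd = 0.1)
  proposal_lp <- log_post(proposal)
  if (log(runif(1)) < proposal_lp - current_lp) {
    current <- proposal
    current_lp <- proposal_lp
  }
  draws[s, ] <- current
}

post <- draws[-seq_len(5000), ]                 # discard burn-in
post_mean <- colMeans(post)
print(round(post_mean, 2))
```

Each row of `post` is one joint draw of \((\alpha, \beta, \sigma)\); summaries such as means or intervals are computed directly from these draws.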

Probabilistic Models

  • Samples from the posterior distributions: a simulated trial
  • Let \(\Omega\) be the subset of selected candidates
  • The size of \(\Omega\) is defined by the selection intensity
  • After each sampling, ask:
    • Which candidates are the top performers?
    • Which candidates are the most stable?
    • Is candidate x better than candidate y?


  • Fitting a Bayesian model

\[ y_{jklm} \sim N(E[y_{jklm}], \sigma)\]

\[ E[y_{jklm}] = \mu + g_{j} + e_{k} + r_{m(k)} + b_{l(mk)} + (ge)_{jk} \]

\[ \mu \sim N (0, S^{[\mu]}) \]

\[ g_{j} \sim N (0, S^{[g]}) \]

\[ S^{[\mu]} \sim HalfCauchy(0, \phi) \]
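
Read generatively, the hierarchy above says: draw the scale parameters, then the effects, then the observations. The sketch below simulates one data set this way. It is a simplified illustration under several assumptions: \(\phi = 1\) is a hypothetical value, the replicate and block terms are left out for brevity, and the environment, interaction and residual scales are given the same half-Cauchy prior as the one shown for \(\mu\). The dimensions follow the data example (36 hybrids, 16 locations, 2 replicates).

```r
set.seed(2024)

n_gen <- 36; n_env <- 16; n_rep <- 2
phi <- 1   # hypothetical hyperparameter of the half-Cauchy prior

# Half-Cauchy draws: absolute value of a Cauchy centred at zero
rhalfcauchy <- function(n, scale) abs(rcauchy(n, location = 0, scale = scale))

# 1. Scale parameters, one per model term
terms <- c("mu", "g", "e", "ge", "res")
S <- setNames(rhalfcauchy(length(terms), phi), terms)

# 2. Effects, given their scales
mu <- rnorm(1, 0, S[["mu"]])
g  <- rnorm(n_gen, 0, S[["g"]])
e  <- rnorm(n_env, 0, S[["e"]])
ge <- matrix(rnorm(n_gen * n_env, 0, S[["ge"]]), n_gen, n_env)

# 3. Observations, given the effects (replicate and block terms omitted)
trial <- expand.grid(gen = seq_len(n_gen), env = seq_len(n_env),
                     rep = seq_len(n_rep))
expected <- mu + g[trial$gen] + e[trial$env] + ge[cbind(trial$gen, trial$env)]
trial$y <- rnorm(nrow(trial), mean = expected, sd = S[["res"]])

str(trial)
```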


  • Probability of superior performance

  • Which candidates are the top performers?

  • What is the risk of recommending a given candidate (performance)?

\[ Pr(\hat{g}_j \in \Omega \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_j^{(s)} \in \Omega \vert y) \]

\[ \begin{cases} \hat{g}_j \in \Omega \rightarrow I(\hat{g}_j^{(s)} \in \Omega \vert y) = 1 \\ \hat{g}_j \notin \Omega \rightarrow I(\hat{g}_j^{(s)} \in \Omega \vert y) = 0 \end{cases} \]

where \(S\) is the total number of posterior samples, i.e. the samples in which \(\hat{g}_j \in \Omega\) plus those in which \(\hat{g}_j \notin \Omega\).
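
The estimator above is simply the proportion of posterior samples in which a candidate falls into the selected subset. The sketch below is a minimal illustration with simulated draws standing in for real posterior samples; the number of candidates, the spread of the draws and the 20% selection intensity are all hypothetical, and selection is assumed to favour higher trait values.

```r
set.seed(42)

n_samples <- 4000   # S: number of posterior samples
n_gen <- 10         # number of candidates

# Stand-in for posterior draws of g_j: an S x J matrix
# (G10 has the highest assumed mean, G1 the lowest)
g_means <- seq(-1, 1, length.out = n_gen)
g_draws <- sapply(g_means, function(m) rnorm(n_samples, mean = m, sd = 0.5))
colnames(g_draws) <- paste0("G", seq_len(n_gen))

# The size of Omega is set by the selection intensity
sel_intensity <- 0.2
n_sel <- ceiling(sel_intensity * n_gen)

# Indicator I(g_j^(s) in Omega): in each sample, is candidate j among the
# n_sel highest values? (use rank(g) instead to select for lower values)
in_omega <- t(apply(g_draws, 1, function(g) rank(-g) <= n_sel))

# Probability of superior performance: average of the indicator over samples
prob_sup <- colMeans(in_omega)
print(round(sort(prob_sup, decreasing = TRUE), 3))
```

Because exactly `n_sel` candidates are selected in every sample, the probabilities sum to `n_sel` across candidates.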

  • Selection for increasing or decreasing the trait value
  • Which candidates are the most stable?
  • What is the risk of recommending a given candidate (stability)?

\[ Pr\left[ Var(\widehat{ge}_{jk}) \in \Omega \vert y \right] = \frac{1}{S} \sum_{s=1}^S I \left[ Var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y \right] \]

\[ \begin{cases} Var(\widehat{ge}_{jk}) \in \Omega \rightarrow I \left[ Var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y \right] = 1 \\ Var(\widehat{ge}_{jk}) \notin \Omega \rightarrow I \left[ Var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y \right] = 0 \end{cases} \]
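
The same counting logic applies to stability, with the variance of the interaction effects across environments taking the place of the genotypic effect and lower values being preferred. The sketch below uses simulated draws as a stand-in for posterior samples of \(\widehat{ge}_{jk}\); all dimensions and spreads are hypothetical.

```r
set.seed(7)

n_samples <- 2000; n_gen <- 8; n_env <- 6

# Stand-in for posterior draws of ge_jk: an S x J x K array.
# G1 is simulated as the most stable candidate (smallest spread).
ge_sd <- seq(0.2, 1.6, length.out = n_gen)
ge_draws <- array(NA_real_, dim = c(n_samples, n_gen, n_env),
                  dimnames = list(NULL, paste0("G", seq_len(n_gen)),
                                  paste0("E", seq_len(n_env))))
for (j in seq_len(n_gen)) {
  ge_draws[, j, ] <- rnorm(n_samples * n_env, sd = ge_sd[j])
}

# Var(ge_jk) across environments, per sample and candidate: S x J matrix
ge_var <- apply(ge_draws, c(1, 2), var)

# Omega holds the n_sel candidates with the LOWEST variance in each sample
n_sel <- 2
in_omega <- t(apply(ge_var, 1, function(v) rank(v) <= n_sel))

# Probability of superior stability
prob_stab <- colMeans(in_omega)
print(round(prob_stab, 3))
```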

  • The lower, the better: stability as invariance
  • Pairwise probability of superior performance/stability

  • Is candidate x better than candidate y?

  • What is the probability that x performs better than y in the TPE?

\[ Pr(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y) \]

\[ \begin{cases} \hat{g}_{j} > \hat{g}_{j^\prime} \rightarrow I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y) = 1 \\ \hat{g}_{j} < \hat{g}_{j^\prime} \rightarrow I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y) = 0 \end{cases} \]
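
The pairwise probability is the share of posterior samples in which one candidate beats the other. A minimal sketch follows, with three hypothetical candidates and simulated draws in place of real posterior samples.

```r
set.seed(3)

n_samples <- 4000

# Stand-in for posterior draws of g_j for three hypothetical candidates
g_draws <- cbind(A = rnorm(n_samples, mean = 1.0, sd = 0.5),
                 B = rnorm(n_samples, mean = 0.5, sd = 0.5),
                 C = rnorm(n_samples, mean = 0.0, sd = 0.5))
gen <- colnames(g_draws)

# pair_prob[a, b] = proportion of samples in which candidate a exceeds b
pair_prob <- sapply(gen, function(b) {
  sapply(gen, function(a) mean(g_draws[, a] > g_draws[, b]))
})
print(round(pair_prob, 3))
```

Rows are the candidate being evaluated and columns the competitor, so `pair_prob["A", "C"]` estimates the probability that A performs better than C.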

  • Same comparisons using \(Var(\widehat{ge}_{jk})\)
  • Marginal probabilities and conditional probabilities
  • Joint probability of superior performance and stability

\[ Pr(\hat{g}_j \in \Omega \vert y) \times Pr (Var(\widehat{ge}_{jk}) \in \Omega \vert y)\]
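
Multiplying the two marginal probabilities treats superior performance and superior stability as independent events. The sketch below applies the product to made-up probabilities for three hypothetical candidates.

```r
# Hypothetical marginal probabilities for three candidates
prob_perf <- c(G1 = 0.70, G2 = 0.40, G3 = 0.10)  # superior performance
prob_stab <- c(G1 = 0.20, G2 = 0.60, G3 = 0.50)  # superior stability

# Joint probability of superior performance and stability
joint_prob <- prob_perf * prob_stab
print(sort(joint_prob, decreasing = TRUE))
```

In this toy case G2 ranks first on the joint probability even though G1 has the highest probability of superior performance.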

  • Diagnostics

  • \(\hat{R} \rightarrow\) mean potential scale reduction factor (\(\sim 1\))
  • \(WAIC2 \rightarrow\) the lower, the better
  • Bayesian p-values \(\rightarrow \sim 0.5\)
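
To make the first diagnostic concrete, the sketch below computes the classic (non-split) potential scale reduction factor for a single parameter from several chains. It is an illustration only: the chains are simulated, and software such as Stan reports refined variants of this statistic.

```r
# Classic potential scale reduction factor for one parameter
# chains: matrix with one row per draw and one column per chain
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

set.seed(99)

# Four chains sampling the same distribution: R-hat should be close to 1
good <- matrix(rnorm(4000), ncol = 4)

# Two chains shifted away from the others: R-hat well above 1
bad <- sweep(good, 2, c(0, 0, 3, 3), "+")

rhat_good <- rhat(good)
rhat_bad <- rhat(bad)
print(c(good = rhat_good, bad = rhat_bad))
```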


  • Performance/stability

  • Pairwise comparison


  • Probabilities within Environments


Tip

  • ProbBreed is a free package, available from CRAN
  • More information is available in the package’s vignette

References

Annicchiarico, P. (1992). Cultivar adaptation and recommendation from alfalfa trials in Northern Italy. Journal of Genetics and Breeding (Italy), 46, 269–278.
Araújo, M. S., Chaves, S. F., Dias, L. A., Ferreira, F. M., Pereira, G. R., Bezerra, A. R., Alves, R. S., Heinemann, A. B., Breseghello, F., Carneiro, P., et al. (2024). GIS-FA: An approach to integrating thematic maps, factor-analytic, and envirotyping for cultivar targeting. Theoretical and Applied Genetics, 137(4), 1–23.
Chaves, S. F. S., Krause, M. D., Dias, L. A. S., Garcia, A. A. F., & Dias, K. O. G. (2024). ProbBreed: A novel tool for calculating the risk of cultivar recommendation in multienvironment trials. G3: Genes, Genomes, Genetics, 14(3), jkae013. https://doi.org/10.1093/g3journal/jkae013
Dias, K. O. D. G., Gezan, S. A., Guimarães, C. T., Nazarian, A., Costa e Silva, L. da, Parentoni, S. N., Oliveira Guimarães, P. E. de, Oliveira Anoni, C. de, Pádua, J. M. V., Oliveira Pinto, M. de, et al. (2018). Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials. Heredity, 121(1), 24–37.
Dias, K. O. G., Santos, J. P. R., Krause, M. D., Piepho, H.-P., Guimarães, L. J. M., Pastina, M. M., & Garcia, A. A. F. (2022). Leveraging probability concepts for cultivar recommendation in multi-environment trials. Theoretical and Applied Genetics, 135(4), 1385–1399. https://doi.org/10.1007/s00122-022-04041-y
Eberhart, S. A., & Russell, W. A. (1966). Stability parameters for comparing varieties. Crop Science, 6(1), 36–40. https://doi.org/10.2135/cropsci1966.0011183X000600010011x
Eskridge, K. M. (1990). Selection of stable cultivars using a safety-first rule. Crop Science, 30(2), 369. https://doi.org/10.2135/cropsci1990.0011183X003000020025x
Finlay, K. W., & Wilkinson, G. N. (1963). The analysis of adaptation in a plant-breeding programme. Australian Journal of Agricultural Research, 14(6), 742. https://doi.org/10.1071/AR9630742
Lin, C. S., & Binns, M. R. (1988). A superiority measure of cultivar performance for cultivar × location data. Canadian Journal of Plant Science, 68(1), 193–198. https://doi.org/10.4141/cjps88-018
Shukla, G. K. (1972). Some statistical aspects of partitioning genotype-environmental components of variability. Heredity, 29(2), 237–245. https://doi.org/10.1038/hdy.1972.87
Souza, V. F. de, Ribeiro, P. C. de O., Vieira Junior, I. C., Oliveira, I. C. M., Damasceno, C. M. B., Schaffert, R. E., Parrella, R. A. da C., Dias, K. O. das G., & Pastina, M. M. (2021). Exploring genotype × environment interaction in sweet sorghum under tropical environments. Agronomy Journal, 113(4), 3005–3018.
Wricke, G. (1965). Zur Berechnung der Ökovalenz bei Sommerweizen und Hafer. Zeitschrift für Pflanzenzüchtung, 52, 127–138.
Yan, W., Hunt, L. A., Sheng, Q., & Szlavnics, Z. (2000). Cultivar evaluation and mega-environment investigation based on the GGE Biplot. Crop Science, 40(3), 597–605. https://doi.org/10.2135/cropsci2000.403597x
Zobel, R. W., Wright, M. J., & Gauch Jr., H. G. (1988). Statistical analysis of a yield trial. Agronomy Journal, 80(3), 388–393. https://doi.org/10.2134/agronj1988.00021962008000030002x